In this paper, we introduce MCTensor, a PyTorch-based library that provides general-purpose, high-precision arithmetic for DL training. MCTensor is used in the same way as a PyTorch Tensor: we implement multiple basic, matrix-level computation operators and NN modules for MCTensor with the same PyTorch interface. Our algorithms achieve high-precision computation while also benefiting from PyTorch's heavily optimized floating-point arithmetic. We evaluate MCTensor arithmetic against PyTorch native arithmetic on a series of tasks, where models using MCTensor in float16 match or outperform PyTorch models run at float32 or float64 precision.
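MCTensor's actual kernels live in the linked library; purely to illustrate the principle behind multi-component arithmetic, the sketch below uses Knuth's two-sum trick, where the rounding error of each addition is carried in an extra low-order component (all names here are illustrative, not MCTensor's API):

```python
import torch

def two_sum(a: torch.Tensor, b: torch.Tensor):
    """Error-free transformation: a + b == s + e exactly, where s is
    the rounded floating-point sum and e is the rounding error."""
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e

# A 2-component number: value ~ hi + lo, both stored in float16.
hi = torch.tensor([1.0], dtype=torch.float16)
lo = torch.zeros_like(hi)

# Accumulate increments that naive float16 addition would drop entirely.
for _ in range(1000):
    inc = torch.tensor([1e-4], dtype=torch.float16)
    hi, err = two_sum(hi, inc)
    lo = lo + err              # keep the rounding error in the low component

print(float(hi + lo))          # ~1.1, versus 1.0 for naive float16 summation
```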
We propose CKAM, Cyclical Kernel Adaptive Metropolis, which incorporates a cyclical step-size scheme to control exploration and sampling. We show that on a carefully constructed bimodal distribution, existing adaptive Metropolis-type algorithms fail to converge to the true posterior distribution. We point out that this is because adaptive samplers estimate the local/global covariance structure from the past history of the chain, which causes the adaptive algorithm to become trapped in a local mode. We demonstrate that CKAM encourages exploration of the posterior distribution and enables the sampler to escape from local modes, while maintaining the high performance of adaptive methods.
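The full algorithm is kernel-adaptive; omitting the covariance adaptation, the cyclical step-size idea alone can be sketched with a plain random-walk Metropolis sampler on a bimodal target (the schedule shape and all constants are my assumptions, not the paper's):

```python
import math
import random

def cyclical_step_size(t, cycle_len=1000, s_min=0.05, s_max=2.0):
    """Cosine cycle: each cycle starts at s_max (exploration)
    and decays to s_min (fine-grained local sampling)."""
    phase = (t % cycle_len) / cycle_len
    return s_min + 0.5 * (s_max - s_min) * (1.0 + math.cos(math.pi * phase))

def log_bimodal(x):
    # Mixture of two well-separated Gaussians: the kind of target on
    # which history-based adaptive samplers can get stuck in one mode.
    return math.log(0.5 * math.exp(-0.5 * (x - 5.0) ** 2)
                    + 0.5 * math.exp(-0.5 * (x + 5.0) ** 2))

x, samples = 0.0, []
for t in range(10_000):
    step = cyclical_step_size(t)
    prop = x + random.gauss(0.0, step)
    if math.log(random.random()) < log_bimodal(prop) - log_bimodal(x):
        x = prop
    samples.append(x)

# Fraction of mass right of zero; near 0.5 if both modes are visited.
print(sum(1 for s in samples if s > 0) / len(samples))
```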
With the development of technology and the sharing economy, Airbnb, a famous short-term rental platform, has become the first choice for many young people. The pricing of Airbnb listings has always been a problem worth studying. While previous studies achieve promising results, deficiencies remain: (1) the feature attributes of rentals are not rich enough; (2) the research on rental text information is not deep enough; (3) there are few studies that predict the rental price using the points of interest (POI) around the house. To address the above challenges, we propose a multi-source information embedding (MSIE) model to predict the rental price of Airbnb listings. Specifically, we first select statistical features to embed the original rental data. Secondly, we generate word feature vectors and emotional scores from three different kinds of text information to form the text feature embedding. Thirdly, we use the points of interest (POI) around the rental house to generate a variety of spatial network graphs, and learn embeddings of these networks to obtain the spatial feature embedding. Finally, we combine the three modules into a multi-source rental representation, and use a fully connected neural network to predict the price. The analysis of the experimental results shows the effectiveness of our proposed model.
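As a hypothetical sketch of the final fusion stage only (the dimensions, names, and head architecture below are assumptions, not the paper's specification), the three embeddings are concatenated and passed to a fully connected regressor:

```python
import torch
import torch.nn as nn

class MSIEPriceRegressor(nn.Module):
    """Fuses statistical, text, and spatial embeddings, then
    regresses the nightly price with a fully connected head."""
    def __init__(self, d_stat=32, d_text=128, d_spatial=64, d_hidden=256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(d_stat + d_text + d_spatial, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, 1),
        )

    def forward(self, e_stat, e_text, e_spatial):
        fused = torch.cat([e_stat, e_text, e_spatial], dim=-1)
        return self.head(fused).squeeze(-1)

model = MSIEPriceRegressor()
price = model(torch.randn(8, 32), torch.randn(8, 128), torch.randn(8, 64))
```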
The Five-hundred-meter Aperture Spherical radio Telescope (FAST) is the world's largest single-dish radio telescope. Its large reflecting surface achieves unprecedented sensitivity but is prone to damage, such as dents and holes, caused by naturally-occurring falling objects. Hence, the timely and accurate detection of surface defects is crucial for FAST's stable operation. Conventional manual inspection involves human inspectors climbing up and examining the large surface visually, a time-consuming and potentially unreliable process. To accelerate the inspection process and increase its accuracy, this work takes the first step towards automating the inspection of FAST by integrating deep-learning techniques with drone technology. First, a drone flies over the surface along a predetermined route. Since surface defects significantly vary in scale and show high inter-class similarity, directly applying existing deep detectors to detect defects on the drone imagery is highly prone to missing and misidentifying defects. As a remedy, we introduce cross-fusion, a dedicated plug-in operation for deep detectors that enables the adaptive fusion of multi-level features in a point-wise selective fashion, depending on local defect patterns. Consequently, strong semantics and fine-grained details are dynamically fused at different positions to support the accurate detection of defects of various scales and types. Our AI-powered drone-based automated inspection is time-efficient, reliable, and has good accessibility, which guarantees the long-term and stable operation of FAST.
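A minimal reading of point-wise selective fusion, assuming the feature levels already share resolution and channel width (my own sketch, not the authors' implementation): a 1x1 convolution predicts one weight per level per position, softmax-normalized across levels:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossFusion(nn.Module):
    """Point-wise selective fusion of L feature maps of equal shape:
    per-position softmax weights decide how much each level
    contributes at that location."""
    def __init__(self, channels: int, num_levels: int):
        super().__init__()
        self.gate = nn.Conv2d(num_levels * channels, num_levels, kernel_size=1)

    def forward(self, feats):                        # list of (B, C, H, W)
        stacked = torch.stack(feats, dim=1)          # (B, L, C, H, W)
        gates = self.gate(torch.cat(feats, dim=1))   # (B, L, H, W)
        w = F.softmax(gates, dim=1).unsqueeze(2)     # (B, L, 1, H, W)
        return (w * stacked).sum(dim=1)              # (B, C, H, W)

fuse = CrossFusion(channels=64, num_levels=3)
out = fuse([torch.randn(2, 64, 32, 32) for _ in range(3)])
```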
3D object detection with multiple sensors is essential for an accurate and reliable perception system in autonomous driving and robotics. Existing 3D detectors significantly improve accuracy by adopting a two-stage paradigm that relies solely on LiDAR point clouds for refining 3D proposals. Though impressive, the sparsity of point clouds, especially for faraway points, makes it difficult for a LiDAR-only refinement module to accurately recognize and locate objects. To address this problem, we propose a novel multi-modality two-stage approach, FusionRCNN, which effectively and efficiently fuses point clouds and camera images within regions of interest (RoI). FusionRCNN adaptively integrates the sparse geometry information from LiDAR and the dense texture information from cameras in a unified attention mechanism. Specifically, in the RoI extraction step it first utilizes RoIPooling to obtain an image set of unified size and obtains the point set by sampling raw points within proposals; it then leverages intra-modality self-attention to enhance domain-specific features, and thereafter fuses the information from the two modalities with a well-designed cross-attention. FusionRCNN is fundamentally plug-and-play and supports different one-stage methods with almost no architectural changes. Extensive experiments on the KITTI and Waymo benchmarks demonstrate that our method significantly boosts the performance of popular detectors. Remarkably, FusionRCNN improves the strong SECOND baseline by 6.14% mAP on Waymo and outperforms competing two-stage approaches. Code will be released soon at https://github.com/xxlbigbrother/fusion-rcnn.
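The described attention pattern can be sketched with stock PyTorch modules, assuming tokenized RoI point and image features of equal width (an illustration of the pipeline, not the released model):

```python
import torch
import torch.nn as nn

class RoIFusion(nn.Module):
    """Self-attention within each modality, then cross-attention
    letting RoI point tokens query RoI image tokens."""
    def __init__(self, d=128, heads=4):
        super().__init__()
        self.pt_self = nn.MultiheadAttention(d, heads, batch_first=True)
        self.im_self = nn.MultiheadAttention(d, heads, batch_first=True)
        self.cross = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, pts, imgs):   # (B, Np, d) point tokens, (B, Ni, d) image tokens
        pts = pts + self.pt_self(pts, pts, pts)[0]      # intra-modality (points)
        imgs = imgs + self.im_self(imgs, imgs, imgs)[0] # intra-modality (image)
        return pts + self.cross(pts, imgs, imgs)[0]     # points attend to image

fusion = RoIFusion()
out = fusion(torch.randn(4, 256, 128), torch.randn(4, 49, 128))
```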
The effectiveness of knowledge graph embedding (KGE) largely depends on the ability to model intrinsic relation patterns and mapping properties. However, existing approaches can only capture some of them, with insufficient modeling capacity. In this work, we propose a more powerful KGE framework named HousE, which involves a novel parameterization based on two kinds of Householder transformations: (1) Householder rotations, to achieve superior capacity for modeling relation patterns; (2) Householder projections, to handle sophisticated relation mapping properties. Theoretically, HousE is capable of modeling crucial relation patterns and mapping properties simultaneously. In addition, HousE is a generalization of existing rotation-based models while extending rotations to high-dimensional spaces. Empirically, HousE achieves new state-of-the-art performance on five benchmark datasets. Our code is available at https://github.com/anrep/house.
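The underlying building block is the Householder reflection H(v) = I - 2·vvᵀ/‖v‖²; composing several reflections yields a rotation in arbitrary dimension. A minimal sketch, with the relation-specific learning machinery omitted:

```python
import torch

def householder(v: torch.Tensor) -> torch.Tensor:
    """Householder reflection matrix H = I - 2 v v^T / ||v||^2."""
    v = v / v.norm()
    return torch.eye(v.numel()) - 2.0 * torch.outer(v, v)

def householder_rotation(vectors, x):
    """Compose k reflections (an even k gives a proper rotation)
    and apply the result to an entity embedding x."""
    for v in vectors:
        x = householder(v) @ x
    return x

d, k = 8, 4                       # embedding dim; k even => rotation
vs = [torch.randn(d) for _ in range(k)]
x = torch.randn(d)
y = householder_rotation(vs, x)
print(torch.allclose(x.norm(), y.norm(), atol=1e-5))  # norm is preserved
```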
Triplet loss is a widely adopted loss function in ReID tasks, pulling the hardest positive pairs close and pushing the hardest negative pairs apart. However, the selected samples are not the hardest globally, but only the hardest within a mini-batch, which affects performance. In this report, a hard-batch mining method is proposed to mine the hardest samples globally, making the triplets harder. More specifically, the most similar classes are selected into the same mini-batch so that similar classes can be pushed apart. In addition, an adversarial scene removal module, composed of a scene classifier and an adversarial loss, is used to learn scene-invariant feature representations. Experiments conducted on the MSMT17 dataset demonstrate the effectiveness of our method, which surpasses all previous methods and sets a new state-of-the-art result.
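For reference, the batch-hard triplet loss the report starts from can be sketched as follows (the proposed global hard-batch mining and the scene removal module are not reproduced here):

```python
import torch

def batch_hard_triplet_loss(emb, labels, margin=0.3):
    """Batch-hard triplet loss: for each anchor, take the hardest
    positive (farthest same-ID) and hardest negative (closest other-ID)."""
    dist = torch.cdist(emb, emb)                       # (B, B) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # (B, B) same-identity mask
    hardest_pos = (dist * same.float()).max(dim=1).values
    inf = torch.full_like(dist, float("inf"))
    hardest_neg = torch.where(same, inf, dist).min(dim=1).values
    return torch.relu(hardest_pos - hardest_neg + margin).mean()

emb = torch.nn.functional.normalize(torch.randn(16, 128), dim=1)
labels = torch.randint(0, 4, (16,))
print(batch_hard_triplet_loss(emb, labels))
```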
Shortening acquisition time and reducing motion artifacts are two of the most important problems in magnetic resonance imaging. As a promising solution, deep-learning-based high-quality MR image restoration has been studied to produce higher-resolution, motion-artifact-free images from lower-resolution images acquired with shortened acquisition times, without adding extra acquisition time or modifying pulse sequences. However, many problems remain that prevent deep learning methods from becoming practical in clinical settings. Specifically, most previous works focus on network models but ignore the impact of various downsampling strategies on acquisition time. In addition, long inference times and high GPU consumption are bottlenecks to deploying most models in clinics. Furthermore, prior studies generate retrospective motion artifacts with random motion, leading to uncontrollable severity of the artifacts. More importantly, doctors cannot be sure whether the generated MR images are trustworthy, making diagnosis difficult. To overcome all these problems, we employ a unified 2D deep learning neural network for both 3D MRI super-resolution and motion-artifact reduction, demonstrating that such a framework achieves better performance on 3D MRI restoration tasks than other state-of-the-art methods, with significantly lower GPU consumption and inference time, and is thus easier to deploy. We also analyze several downsampling strategies in terms of acceleration, including multiple combinations of in-plane and through-plane downsampling, and develop a controllable and quantifiable motion-artifact generation method. Finally, pixel-wise uncertainty is computed and used to estimate the accuracy of the generated images, providing additional information for reliable diagnosis.
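As a rough stand-in for the downsampling strategies discussed (real protocols shorten acquisition by subsampling k-space or skipping slices; the interpolation below is only a simulation, and all factors are assumptions):

```python
import torch
import torch.nn.functional as F

def downsample_mri(vol, in_plane=2, through_plane=1):
    """Simulate shortened acquisition on a 3D volume (B, C, D, H, W):
    `through_plane` thins slices along D, while `in_plane` reduces
    resolution within each slice (H, W)."""
    return F.interpolate(
        vol,
        scale_factor=(1 / through_plane, 1 / in_plane, 1 / in_plane),
        mode="trilinear",
        align_corners=False,
    )

vol = torch.randn(1, 1, 32, 256, 256)            # a toy 3D volume
lr = downsample_mri(vol, in_plane=2, through_plane=2)
print(lr.shape)                                  # torch.Size([1, 1, 16, 128, 128])
```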
Fine-grained geometry, captured by aggregating point features in local regions, is key to object recognition and scene understanding in point clouds. However, existing prominent point cloud backbones usually employ max/average pooling for local feature aggregation, which largely ignores the positional distribution of points and leads to inadequate assembly of fine-grained structures. To mitigate this bottleneck, we present an efficient alternative, Position-Adaptive Pooling (PAPooling), that explicitly models the spatial relations among local points using a novel graph representation and aggregates features in a position-adaptive manner, enabling a position-sensitive representation of aggregated features. Specifically, PAPooling consists of two key steps, graph construction and feature aggregation. The former constructs a graph with edges linking the center point to every neighboring point in a local region, mapping their relative positional information to channel-wise attentive weights; the latter adaptively aggregates local point features based on the generated weights through a Graph Convolutional Network (GCN). PAPooling is simple yet effective, and flexible enough to serve as a plug-and-play operator for different popular backbones such as PointNet++ and DGCNN. Extensive experiments on various tasks, ranging from 3D shape classification and part segmentation to scene segmentation, demonstrate that PAPooling significantly improves predictive accuracy with minimal extra computational overhead. Code will be released.
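One way to read the two steps (a sketch that replaces the paper's GCN with a small MLP over relative positions; names and shapes are mine): relative neighbor positions are mapped to channel-wise attention weights, which then reweight neighbor features before summation replaces max/average pooling:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PAPooling(nn.Module):
    """Position-adaptive pooling over a local neighborhood:
    relative neighbor positions -> channel-wise attentive weights,
    then a weighted sum instead of max/average pooling."""
    def __init__(self, channels: int):
        super().__init__()
        self.pos_mlp = nn.Sequential(
            nn.Linear(3, channels), nn.ReLU(), nn.Linear(channels, channels)
        )

    def forward(self, feats, rel_pos):
        # feats: (B, N, K, C) neighbor features; rel_pos: (B, N, K, 3)
        w = F.softmax(self.pos_mlp(rel_pos), dim=2)    # weights over K neighbors
        return (w * feats).sum(dim=2)                  # (B, N, C)

pool = PAPooling(channels=64)
out = pool(torch.randn(2, 1024, 16, 64), torch.randn(2, 1024, 16, 3))
```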
Score-based diffusion models have captured widespread attention and fueled fast progress in recent vision generative tasks. In this paper, we focus on the diffusion model backbone, which has been largely neglected before. We systematically explore vision Transformers as diffusion learners for various generative tasks. With our improvements, the performance of a vanilla ViT-based backbone (IU-ViT) is boosted to be on par with traditional U-Net-based methods. We further provide a hypothesis on the implication of disentangling the generative backbone into an encoder-decoder structure, and show proof-of-concept experiments verifying the effectiveness of a stronger encoder for generative tasks with an ASymmetriC ENcoder-Decoder (ASCEND). Our improvements achieve competitive results on CIFAR-10, CelebA, LSUN, CUB Bird, and large-resolution text-to-image tasks. To the best of our knowledge, we are the first to successfully train a single diffusion model on a text-to-image task beyond 64x64 resolution. We hope this will motivate people to rethink the modeling choices and training pipelines for diffusion-based generative models.
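A toy sketch of the asymmetric encoder-decoder idea, skewing capacity toward the encoder (layer counts and widths are arbitrary assumptions, not the paper's configuration):

```python
import torch
import torch.nn as nn

class AsymmetricDenoiser(nn.Module):
    """Encoder-decoder denoiser with capacity skewed to the encoder,
    in the spirit of the stronger-encoder hypothesis."""
    def __init__(self, d=256, enc_layers=8, dec_layers=2, heads=4):
        super().__init__()
        layer = lambda: nn.TransformerEncoderLayer(d, heads, 4 * d,
                                                   batch_first=True)
        self.encoder = nn.TransformerEncoder(layer(), enc_layers)
        self.decoder = nn.TransformerEncoder(layer(), dec_layers)
        self.out = nn.Linear(d, d)

    def forward(self, tokens):      # (B, N, d) noisy patch tokens
        return self.out(self.decoder(self.encoder(tokens)))

net = AsymmetricDenoiser()
eps = net(torch.randn(2, 64, 256))  # predicted noise per token
```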